Why is the special structure of the language important for Chinese spoken language processing? - examples on spoken document retrieval, segmentation and summarization
Abstract
The Chinese language is not only spoken by the largest population in the world, but is also quite different from many Western languages, with a very special structure. It is not alphabetic: a large number of Chinese characters are ideographic symbols pronounced as monosyllables. The open vocabulary, the flexible wording structure and the tone behavior are further examples of this special structure. It is believed that better performance can be obtained in developing Chinese spoken language processing technologies if this special structure is taken into account. In this paper, a set of “feature units” for Chinese spoken language processing is identified, and the retrieval, segmentation and summarization of Chinese spoken documents are taken as examples in analyzing the use of such “feature units”. Experimental results indicate that, with careful consideration of the special structure and proper choice of the “feature units”, significantly better performance can be achieved.
Related resources
An Investigation of Spoken Output and Intervention Types among Iranian EFL Learners
This study was inspired by VanPatten and Uludag’s (2011) study on the transferability of training via processing instruction to output tasks and Mori’s (2002) work on the development of talk-in-interaction during a group task. An interview was devised as the pretest, posttest, and delayed posttest to compare four intervention types for teaching the simple past passive: traditional intervention ...
Structural Features of Chinese Language
The Chinese language is quite different from many Western languages in various structural features. It is not alphabetic. A large number of Chinese characters are ideographic symbols. The monosyllabic structure, the open vocabulary nature, the flexible wording structure with tones, and the flexibilities in word ordering are good examples of the structural features of the Chinese language. It is believed ...
Extractive Spoken Document Summarization with Representation Learning Techniques
The rapidly increasing availability of multimedia associated with spoken documents on the Internet has made automatic spoken document summarization an important research subject. Thus far, the majority of existing work has focused on extractive spoken document summarization, which selects salient sentences from an original spoken document according to a target summarization ratio and ...
Automatic title generation for Chinese spoken documents considering the special structure of the language
The purpose of automatic title generation is to understand a document and to summarize it in only a few readable words or phrases. It is important for browsing and retrieving spoken documents, which may be automatically transcribed but are much more helpful when given titles indicating their content subjects. On the other hand, the Chinese language is not only s...
Comparison of spectral methods for spoken language identification
Automatic spoken language identification determines the language being spoken from the speech signal. Language identification systems can be divided into two categories: spectral-based methods and phonetic-based methods. In the former, short-time characteristics of the speech spectrum are extracted as a multi-dimensional vector. The statistical model of these features is then obtained for each language. The ...
Publication date: 2003